SHORT REPORT



Fine mapping genetic determinants of the highly 
variably expressed MHC gene ZFP57 

Katharine Plant, Benjamin P Fairfax, Seiko Makino, Claire Vandiedonck, Jayachandran Radhakrishnan
and Julian C Knight

ZFP57 is an important transcriptional regulator involved in DNA methylation and genomic imprinting during development. 
Here we demonstrate that gene expression also occurs at a low level in adult peripheral blood cells and other tissues including 
the kidney and thymus, but is critically dependent on underlying local genetic variation within the MHC. We resolve a highly 
significant expression quantitative trait locus for ZFP57 involving single-nucleotide polymorphisms (SNPs) in the first intron 
of the gene co-localizing with a DNase I hypersensitive site and evidence of CTCF recruitment. These data identify ZFP57 as 
a candidate gene underlying reported MHC disease associations, notably for putative regulatory variants associated with cancer 
and HIV-1. The work highlights the role that ZFP57 may play in DNA methylation and epigenetic regulation beyond early 
development into adult life dependent on genetic background, with important potential implications for disease. 
INTRODUCTION

ZFP57 is a Krüppel-associated box (KRAB)-containing zinc-finger
protein, preferentially expressed early in development. ZFP57 has
been shown in the mouse to act as a transcriptional regulator and is
important in maintenance of imprinting, regulating chromatin
modifications and DNA methylation at murine imprinted loci in ES
cells with the co-factor KRAB-associated protein 1 (KAP1/TRIM28).
ZFP57 loss-of-function mutations are known to cause a
hypomethylation disorder presenting as transient neonatal diabetes
(TND) associated with a unique epigenetic profile at the TND
differentially methylated region and other imprinted loci such as
GRB10 and PEG3, but the biology of ZFP57 in humans is not
well characterized. Recently we showed that the expression of ZFP57
was dependent on underlying genetic variation. Given the location of
ZFP57 in the MHC class I region, we sought to resolve the association
and investigate the relationship with disease.

MATERIALS AND METHODS 

Volunteer recruitment, cell purification, cell culture, RACE, genotyping, 
imputation, eQTL mapping and relationship with reported GWAS were 
performed as detailed in Supplementary Information.

RESULTS 

We aimed to define the genetic modulators of ZFP57 transcription by
expression quantitative trait locus (eQTL) mapping. Alternatively spliced
isoforms are well characterized for murine Zfp57 and a number of
isoforms are annotated in humans (Supplementary Figure S1). In
order to take account of this when quantifying ZFP57 expression, we
first characterized transcription in lymphoblastoid cell lines (LCLs)
and peripheral blood mononuclear cells (PBMCs) from volunteers
identified as expressing ZFP57. Rapid amplification of cDNA ends
(RACE) using 3' and 5' adapted cDNA from the COX LCL, known to
be a high expresser of ZFP57, and PCR with exon-spanning primers
revealed a previously unrecognized isoform, in which exon 2 is
skipped and which is predicted to have a significantly truncated KRAB
domain (Supplementary Figure S1). Quantification of ZFP57 using
isoform-specific primers or primers spanning exons 3/4 to capture both
isoforms revealed low but detectable expression in PBMCs, ES cells
and several adult tissues, notably the thymus and kidney
(Supplementary Figure S1). Relative abundance of the different
isoforms remained consistent between different tissues and across
individuals (Supplementary Figure S1).

We proceeded to eQTL mapping in a cohort of 288 healthy
volunteers using primers spanning exons 3/4 to quantify transcript
abundance in PBMCs. Following processing and quality control
filtering, we analysed 651 210 SNP markers for 283 individuals. This
revealed a major eQTL for ZFP57, with the most significantly associated
SNP, rs375984 (P = 9.3 × 10^-[illegible]), in the second intron of ZFP57
(Figure 1a and b). Analysis of purified monocytes from the same
volunteers confirmed a strong eQTL, again with the most significant
association at rs375984 (P = 3.2 × 10^-[illegible]; Supplementary Figure S2).
To further resolve this, we imputed 19 129 additional SNPs within
250 kb, which revealed three more strongly associated variants in
the first intron of ZFP57 in perfect LD (rs416568, rs365052 and
rs2747431, P = 4.6 × 10^-[illegible]; Figure 1c and d). We determined the
functional genomic landscape for these eSNPs using data from the
ENCODE project. Analysis of DNase-seq and ChIP-seq data sets
resolved rs365052 as a candidate regulatory variant located in a DNase
I hypersensitive site with evidence of CTCF binding (Figure 1d).
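
As an illustration of the single-marker test that underlies this kind of eQTL mapping, the sketch below regresses expression on allele dosage under an additive model. It is a minimal sketch under stated assumptions, not the pipeline used in this study (which is described in the Supplementary Information): the function name `additive_eqtl` and the toy values are hypothetical, and a real analysis would typically add covariates, derive a P-value from the t distribution and correct for multiple testing.

```python
import math


def additive_eqtl(dosages, expression):
    """Least-squares fit of expression on allele dosage (0, 1 or 2 copies).

    Returns (slope, t_statistic, degrees_of_freedom). The P-value is not
    computed here; it would come from a t distribution with the returned
    degrees of freedom.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched vectors with at least 3 samples")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("monomorphic SNP: no genotype variation")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares around the fitted line
    rss = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(dosages, expression))
    df = n - 2
    se = math.sqrt(rss / df / sxx)  # standard error of the slope
    t_stat = slope / se if se > 0 else math.inf
    return slope, t_stat, df


# Toy data (hypothetical, not taken from the study)
toy_dosages = [0, 0, 0, 1, 1, 1, 2, 2, 2]
toy_expression = [0.1, 0.2, 0.0, 1.1, 0.9, 1.0, 2.1, 1.9, 2.0]
slope, t_stat, df = additive_eqtl(toy_dosages, toy_expression)
print(f"slope={slope:.3f} t={t_stat:.2f} df={df}")
```

Repeating such a test for every SNP and plotting -log10(P) by genomic position gives a Manhattan plot of the kind described for Figure 1a.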

Analysis of the eQTL by HLA type showed association with
HLA-A*01 and HLA-A*23, but was not more informative than SNP
Figure 1 Genetic modulators of ZFP57 expression. (a) Manhattan plot showing strength of association plotted as -log10(P) values by chromosome for
ZFP57 expression. (b) Scatter/box and whiskers plot of ZFP57 expression by rs375984 allele demonstrating significant differences between the different
genotype groups (P<0.0001, Kruskal-Wallis test). (c) Local association and recombination plot. Single marker allelic association results for a 215 kb region
spanning ZFP57 plotted as -log10(P) values (left y-axis) by genomic coordinate (x-axis). With reference to rs2747431 (which is in complete LD with
rs416568 and rs365052), typed SNPs are shown in red (r² > 0.8), orange (0.5-0.8), yellow (0.2-0.5) and white (<0.2). Imputed SNPs are shown in grey.
Recombination rate is also plotted (right y-axis). (d) Functional genomic landscape for ZFP57 (chr6:29640242-29650866) providing context for observed
eSNPs, including rs375984, rs416568, rs365052 and rs2747431. Data are shown from the ENCODE project, accessed through the UCSC Genome
Browser (http://genome.ucsc.edu/), resolving a DNase I hypersensitive site and evidence of CTCF binding in the region of rs365052 based on profiling of ES
cells, LCLs (GM12878, GM12891) and CD20+ B cells. A linkage disequilibrium plot for the locus, including 115 SNPs (1000 Genomes CEU phase 1),
is shown below.



markers (Figure 2). For the two most common ancestral
haplotypes among Europeans, we found that volunteers with a
copy of HLA-A1-B8-DR3 (n = 19) had higher expression of
ZFP57 than those with HLA-A3-B7-DR15 (n = 12; Mann-Whitney,
P<0.0001).
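
The haplotype comparison above relies on a rank-based Mann-Whitney test. The sketch below shows how the U statistic can be computed, using midranks for tied values. It is illustrative only: `mann_whitney_u` and the example values are assumptions for demonstration, the P-value calculation is omitted, and nothing here reproduces the study's data.

```python
def mann_whitney_u(group_a, group_b):
    """Return the Mann-Whitney U statistic for group_a.

    U counts the (a, b) pairs in which a exceeds b, with ties counted as
    one half. It is obtained from the rank sum of group_a, where tied
    values share the average (mid) rank.
    """
    combined = sorted(
        [(value, 0) for value in group_a] + [(value, 1) for value in group_b],
        key=lambda pair: pair[0],
    )
    rank_sum_a = 0.0
    i = 0
    while i < len(combined):
        # Find the run of tied values starting at position i
        j = i
        while j < len(combined) and combined[j][0] == combined[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        members_of_a = sum(1 for k in range(i, j) if combined[k][1] == 0)
        rank_sum_a += midrank * members_of_a
        i = j
    n_a = len(group_a)
    return rank_sum_a - n_a * (n_a + 1) / 2


# Toy expression values for two hypothetical haplotype groups
high_group = [5.0, 6.0, 7.0]
low_group = [1.0, 2.0, 3.0, 4.0]
print(mann_whitney_u(high_group, low_group))
```

When every value in the first group exceeds every value in the second, U equals the product of the two group sizes, which is the most extreme value the statistic can take.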

We investigated whether the genetic variants identified here as
associated with ZFP57 expression may be significant in common
disease given the many disease associations reported involving the
MHC class I region. We interrogated GWAS data sets and found
intersection of ZFP57 eSNPs with reported disease
associations involving malignancy, HIV/AIDS and autoimmunity
(Table 1). These included nasopharyngeal carcinoma and prostate
cancer, the latter involving disease risk based on gene-gene
interaction with the tumour suppressor gene NKX3-1. Associations
were also noted involving HIV-1 viral set point and disease
progression to AIDS.
Figure 2 Association between ZFP57 expression, eSNPs and classical HLA types. HLA-A, HLA-B, HLA-C, HLA-DRB, HLA-DQA and HLA-DQB shown at
two-digit resolution with ZFP57 expression in PBMCs quantified by qPCR. Expression values are plotted for each individual corresponding to each HLA allele
and coloured based on rs2747431 genotype (individuals with CC genotype at rs2747431 shown in red, CT in green and TT in blue). Two ZFP57 expression
values are plotted for each individual corresponding to each allele. There was evidence of association for HLA-A*01 and HLA-A*23 alleles (P<0.0001
when analysed using a Mann-Whitney test).



DISCUSSION 

We found that a significant minority of people show low-level
transcription of ZFP57 in adult cells and tissues, where it may
modulate epigenetic processes, and that this is dependent on a strong
local eQTL for ZFP57. Further work is required to resolve the
functional basis for this, but a potential mechanism involves
modulation of a novel regulatory element involving CTCF binding
in the first intron of ZFP57. Our data also highlighted a potential role
for ZFP57 eSNPs in traits including cancer and HIV/AIDS. KRAB-ZNF
genes play a role in epigenetic processes critical to cancer,
including silencing of tumour suppressor genes, while the co-factor
KAP1 is involved in oncogenesis. DNA methylation is involved in
establishing latency by retroviruses, with hypermethylation of the viral
5' long terminal repeat characteristic of HIV-1 aviraemic patients.
To date in humans, ZFP57 has only been associated with maintenance
and not establishment of DNA methylation, though a role in de novo
methylation has been reported in mice. The identification of a novel
shorter isoform of ZFP57 may be functionally significant given the
resulting severely truncated KRAB domain, which is likely to limit
interaction with KAP1.

Complex LD structure in the MHC together with differences in
SNP coverage between genotyping platforms necessitates further work
to establish whether the most significant ZFP57 eSNPs are also most
informative for disease association and to explore the biological
significance and relative importance of this observation in the context
of extensive haplotype-specific expression in the MHC for other
genes, with multiple cis- and trans-eQTL identified for this genomic
region.
Table 1 Diseases and traits from the Catalogue of Published Genome-Wide Association Studies (www.genome.gov/gwastudies, accessed May
2012) in which reported GWAS SNPs are also eSNPs for ZFP57 expression

Disease/trait                                        PubMed ID  First author   SNP         GWAS P-value           eQTL P-value   r² with peak eSNP

Cancer
Nasopharyngeal carcinoma                             19664746   Tse KP         rs3129055   7.? × 10^-11           5.4 × 10^-39   0.61
Nasopharyngeal carcinoma                             20512145   Bei JX         rs2860580   4.9 × 10^-[illegible]  7.0 × 10^-07   0.22
Prostate cancer (gene × gene interaction)            22219177   Tao S          rs2523395   1.5 × 10^-[illegible]  1.3 × 10^-13   0.35

Autoimmune disease
Graves' disease                                      21900946   Nakabayashi K  rs3893464   1.9 × 10^-20           1.1 × 10^-18   0.47
Multiple sclerosis                                   [missing]  De Jager PL    [illegible] 1.? × 10^-17           1.? × 10^-13   [illegible]

HIV/AIDS
AIDS progression                                     19115949   Limou S        rs8321      4.7 × 10^-[illegible]  5.5 × 10^-14   0.30
HIV-1 control                                        20041166   Fellay J       rs259919    3.0 × 10^-[illegible]  9.8 × 10^-14   0.22

Other
Drug-induced liver injury (amoxicillin-clavulanate)  21570397   Lucena MI      rs2523822   1.8 × 10^-10           4.6 × 10^-07   0.21
IgE levels                                           22075330   Granada M      rs2571391   1.2 × 10^-15           1.8 × 10^-06   0.23

r² between these SNPs and the peak ZFP57 eQTL, tagged by rs2747431, is shown.
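
The r² values in Table 1 measure linkage disequilibrium between each GWAS SNP and the peak eSNP. As a hedged illustration of the standard definition, r² = D² / (pA(1 - pA) pB(1 - pB)), the sketch below computes it from phased haplotypes. The function name `ld_r_squared` and the toy haplotypes are hypothetical, and the study's own LD estimates (based on 1000 Genomes CEU data) were not necessarily produced this way.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: iterable of (a, b) pairs, each 0 or 1, giving the allele
    carried at SNP A and at SNP B on a single chromosome.
    """
    haplotypes = list(haplotypes)
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    return d * d / denominator


# Toy haplotypes (hypothetical): alleles always travel together
perfect_ld = [(1, 1), (1, 1), (0, 0), (0, 0)]
# Toy haplotypes (hypothetical): alleles combine independently
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r_squared(perfect_ld), ld_r_squared(no_ld))
```

An r² of 1 corresponds to the perfect LD reported among rs416568, rs365052 and rs2747431, whereas the values of 0.21 to 0.61 in Table 1 indicate only partial correlation with the peak eSNP.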



We have presented evidence that ZFP57 is expressed more widely 
than previously appreciated, notably beyond development, and that 
this is dependent on underlying genetic variation. Further work is 
needed to investigate the role of ZFP57 in epigenetic regulation, 
notably in cancer and HIV infection, where expression-associated
SNPs may play a role and epigenetic mechanisms are
known to be important. 